Author: lifetime8_797 | Source: Internet | 2023-05-17 22:45
I am using lxml's xpath function to retrieve parts of a webpage. I am trying to get the contents of a <font> tag, which includes HTML tags of its own. If I use
//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]
I get the right number of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>).
If I use
//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()
I get exactly what I want, except that I don't get any of the HTML code contained within the <font> nodes.
If I use
//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/node()
I get a mixture of text and lxml elements! (e.g. something <Element a at 0x102ac2140> something)
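The three behaviours described above can be reproduced with a small sketch. The markup below is a hypothetical, reduced stand-in for the real page (one matching node, with a made-up link), and the names `doc` and `base` are introduced only for illustration.

```python
from lxml import html

# Hypothetical markup: a single <font> node that mixes text and a child tag.
doc = html.fromstring(
    '<html><body><table><tr><td valign="top"><p>'
    '<font face="verdana" color="#ffffff" size="2">'
    'before <a href="#">link</a> after</font>'
    '</p></td></tr></table></body></html>'
)

base = ('//td[@valign="top"]/p[1]'
        '/font[@face="verdana" and @color="#ffffff" and @size="2"]')

elements = doc.xpath(base)            # list of lxml element objects
texts = doc.xpath(base + '/text()')   # direct text nodes only, child tags dropped
nodes = doc.xpath(base + '/node()')   # strings and element objects interleaved

print(elements)
print(texts)
print(nodes)
```

The `text()` step only selects text nodes that are direct children of the element, which is why the markup of the child tags disappears from that result.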
Is there any way to use a pure XPath query to get the contents of the <font> nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?
Note that I'm returning a list of many nodes from the XPath query so the solution needs to support that.
Just to clarify: I want to return the full contents of each <font> node as a string, inner tags included (for example, a link together with the text around it), from markup where text and child elements are mixed inside the node.
2 Solutions